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Abstract 

Entropy-like functionals on operator algebras have been studied since the pioneering work of von Neumann, 
Umegaki, Lindblad, and Lieb. The most well-known are the von Neumann entropy I(p) := — trace(,o log p) and a 
generalization of the Kullback-Leibler distance |cr) := trace(plogp — plogc), refered to as quantum relative 
entropy and used to quantify distance between states of a quantum system. The purpose of this paper is to explore 
I and § as regularizing functionals in seeking solutions to multi-variable and multi-dimensional moment problems. 
It will be shown that extrema can be effectively constructed via a suitable homotopy. The homotopy approach 
leads naturally to a further generalization and a description of all the solutions to such moment problems. This is 
accomplished by a renormalization of a Riemannian metric induced by entropy functionals. As an application we 
discuss the inverse problem of describing power spectra which are consistent with second-order statistics, which 
has been the main motivation behind the present work. 

Index Terms 

Moment problem, spectral analysis, covariance matching, multi-variable, multi-dimensional, quantum entropy. 



I. Introduction 

HE quantum relative entropy (Umegaki [72]) 

§(p||cr) := trace(plogp — plogcx) 

where p, a are positive Hermitian matrices (or operators) with trace equal to one, generalizes the Kullback- 
Leibler relative entropy [43], just as the von Neumann entropy [55] 

I(p) := — trace(plogp) 

generalizes the classical Shannon entropy. They both inherit a rather rich structure from their scalar coun- 
terparts and in particular, S(-||-) is jointly convex in its arguments as shown by Lieb [48] in 1973, whereas 
!(•) is concave. The relative entropy originates in the quest to quantify the difficulty in discriminating 
odi \ between probability distributions and can be thought as a distance between such. Its matricial counterpart 
§ can similarly be used to quantify distances between positive matrices. 

Entropy and relative entropy have played a central role in thermodynamics in enumerating states 
consistent with data and, thereby, used to identify "the most likely" ones among all possible alternatives. 
The measurement of a physical property in a classical setting is modeled via ensemble averaging (e.g., 
see [41, Chapter 3]) 

r = ^2g(k)p(k) 
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where k runs over all micro-states corresponding to a scalar value g(k). Each micro-state occurs with 
probability p{k) and r is a moment of the underlying probability distribution. Similarly, quantum mea- 
surement is also modeled by averaging (as originally idealized by von Neumann, see e.g., [71, Chapter 
5], [35, page 183]): 

Pafter = G (k) p hciorc G (k)* 

k 

where the p's represent density matrices (positive Hermitian with trace one), the G's represent products 
of projection operators, and "*" denotes "conjugate-transpose". Similar expressions arise for the density 
operator when restricted to a subsystem (partial trace [71, page 185]) and also when measuring "non- 
selfadjoint observables" (e.g., [75]). If the underlying Hilbert space is infinite dimensional then the 
measurement process can be modeled with a continuous analogue of the above where the summation 
is replaced by an integral (e.g., see [8]). These are instances of moment problems. More generally we 
may consider 

R = ^2G h)[t (k)p(k)G tm (k) (1) 

k 

where p{k) are Hermitian positive matrices as well as its "continuous" counterpart 

R= ^G Mt (6)p(6)G Tight (e)d6 (2) 

where p{6) represents a Hermitian-valued positive (density) function on a support set S C M fc (k > 
1) and Gieft,G r ight are matrix- valued functions on S. If the underlying distribution is not absolutely 
continuous then we write R = f s GM t (0)dp(9)G T i g h t (0) instead, where dp is such a positive Hermitian- 
valued measure. 

The moment problem (U]|2l) is typified by spectral analysis based on second-order statistics, especially 
in the context of sensor arrays and of polarimetric radar. The echo at different polarizations and/or at 
different wavelengths is being sampled at a variety of sensor locations. It is usually the case that these 
samples are not independent and that the echo at different frequencies, polarizations etc., affects each 
sensor by a different amount. Attributes of the scattering field (e.g., reflectivity at different wavelengths 
and polarization) and the relative position of the array elements with respect to the scattering field are 
responsible for the variations in the vectorial echo. The vector of attributes can be thought of as a vectorial 
input u(9) to the array while the relative position and characteristics of its elements specify a ni oft x m 
transfer function matrix 

GWt(#) = : 

-■S^ieftJeft. 

to the n left sensor outputs. If the attributes u(8) are modeled as a zero-mean vectorial stochastic process, 
independent over frequencies, then 

2/ieft = / G Mt {0)du(6) 



Js 

represents the vectorial output process. Similarly, if 

Grightifi) = [fright, • • • 5V ight , right] 

is the m x n rig ht complex conjugate transpose of the transfer matrix corresponding to a second group of 
sensors, and if 

2/right = J G righ t(8)*du(9), 

designate the corresponding vector of n rig ht outputs, then the ni c f t x n rig ht correlation matrix 

R = E{yi eh y* ight } 
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gives rise to the matricial moment constraint on the spectral distribution of u given in ©. On the other 
hand can be interpreted when the power spectrum is discrete. A power density which matches the 
correlation samples aims at giving clues about the makeup of the scattering field. 

We address the moment problem in the above generality and provide a way to answer the following: 

(i) does there exist a density function satisfying {T||2j)? 

(ii) if yes, describe all density functions consistent with {T||2|. 

In essence, the above questions go back to 1980 when Dickinson [16] raised the issue of consistency 
of two-dimensional Markovian estimates. Consistency of scalar distributions was taken up in the work 
of Lang and McClellan [44] and Lewis [47]. Both references used entropy functionals and suggested 
computational solutions to the first question when dealing with scalar distributions. The present work 
follows up in the footsteps of these as well as, of a rather extensive literature on inverse problems [67], 
[68], [74], [45], [52], [15], [22], [46], [40] having roots in the early days of statistical mechanics. The key 
idea has been to seek extrema of entropy functionals — existence would guarantee solvability of the moment 
problem. The idea of using "weighted" entropy functionals to parametrize solutions originates in Byrnes, 
Gusev and Lindquist [13]. It was followed up in [11], [12] and in [9], [31] where it was reformulated using 
the Kullback-Leibler distance between sought solutions and positive "priors." Exploring the connection 
with the Kullback-Leibler distance, [31], [9], [14] and more resently [30], studied the scalar moment 
problem in various levels of generality. 

Classical moment problems [1], [42] are closely related to analytic interpolation ones, and as such, 
have been studied in great generality, including their matrix-valued counterpart (see e.g., [62]). However, 
analytic interpolation applies only when the integration kernels possess a very particular shift-structure 
similar to that of a Fourier vector (see e.g., [4], [18] and also [28], [29]), and is of limited use in the 
generality sought herein. On the other hand, literature on interpolation with a "complexity constraint" is 
relevant since it departs from the groove of the classical theory. Two works are especially relevant, [23] and 
more recently [6]. In [23] a homotopy was suggested in the context of the matricial trigonometric moment 
problem. Then [6] used optimization of an entropy functional in the context of matricial Nevanlinna-Pick 
interpolation. Neither applies in the generality sought herein, yet, below, we build on both of these general 
directions. 

The present work follows up along the lines of [31] where it was suggested that the quantum relative 
entropy may be used in the multi- variable case. In fact, we carry out the plan suggested in [31] for multi- 
variable as well as multi-dimensional distributions and we develop a computational approach analogous 
to one presented in [30] for scalar distributions. 

In view of a rather rich literature on quantum entropies, a comment is in order as to other possible 
connections to the present work. Besides the Umegaki-von Neumann entropy S(-||-) studied in this paper, 
there is a plethora of alternatives due to a dichotomy between matricial and scalar distributions [58], [57]. 
In particular Araki's theory [3], [49] helped characterize a family of "quasi-entropies," contractive under 
stochastic maps. References [59], [60], [49] in particular explore the Riemannian geometry they induce 
on density matrices. It is an interesting question as to which among this "garden of entropies," besides the 
Umegaki-von Neumann one, allows a convenient representation of solutions for general moment problems. 
The approach we have taken leads us to work mostly with an induced metric (a Jacobian related to the 
Hessian of §(-||-)). A suitable normalization then recovers any solution of the moment problem as a 
corresponding extremal. It is not known whether, the "weighted metrics" e.g., Vh w in Section |IVJ are 
metrics induced by a quasi-entropy in the language of Petz [59]. 

Finally, it is interesting to point out that, a counterpart for discrete distributions relates to the theory 
of analytic centers in semi-definite programming [7], [54]. In fact, a key construction in this paper — a 
homotopy for the numerical computation of solutions, is analogous to tracing paths of analytic centers in 
interior point methods. 

In Section [H] with discuss three motivating examples. Section [in] develops the geometry of matricial 
cones and the significance of relative entropy as a barrier functional. In order to simplify the exposition, we 
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first deal with cones of constant matrices and then with those of matrix- valued density functions. Except 
for technical differences, the two run in parallel. Finally, Section |IV] discusses the parametrization of 
solutions to the moment problem, followed by concluding remarks (Section |V|) and an appendix (Section 
fVTb with useful facts on matricial calculus. 

II. Motivating Examples 

In this section we present four examples that motivate our study. For simplicity, the first two are 
developed in the context of scalar distributions, the third is intrinsically multi-variable. The fourth example 
pertains to the connection of the moment problem with analytic interpolation. We return to the first and 
fourth example again later on in the paper. 

A. Non-equispaced arrays 

Consider an array of sensors with three elements, linearly spaced at distances 1 and v2 wavelengths 
from one another, and assume that (monochromatic) planar waves, originating from afar, impinge upon 
the array. This is exemplified in Figure [l] 

E Ei E 2 

o o o 




Assuming that the sensors are sensitive to disturbances originating over one side of the array, with 
sensitivity independent of direction, the signal at the £th sensor is typically represented as a superposition 

/*7T 

u e (t) = / A(9)e j{wt - pXiCOsie)+m) d9, 



of waves arising from all spatial directions 9 6 [0, 7r], where uj is as usual the angular time-frequency (as 
opposed to "spatial"), xi the distance between the Eth and the 0th sensor, p the wavenumber, and A(9)d9 
the amplitude and <p{9) a random phase of the ^-component. Typically, (f)(9) for various values of 9 are 
uncorrected. The term pxecos(9) in the exponent accounts for the phase difference between reception at 
different sensors. For simplicity we assume that p = 1 in appropriate units. Correlating the sensor outputs 
we obtain n 

R k = E{u h u i2 } := / e~i kcos Wf{9)d9 
Jo 
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where f{9) 
1}. Thus, 



|A(6 I )| 2 represents power density, and k — l\ —£ 2 with 

kel:= {0,1,^,^+1}. 



> £ 2 and belonging to {0, 1, a/2 + 

(3) 



The only significance of our selection of distances between sensors, that gave rise to the indexing set 
©, is to underscore that there is no algebraic dependence between the elements of the so-called array 
manifold 

G{9) := [l e~i T e~^ T e~^ +1 >]\ 

(where "[•]'" denotes the transpose, G is thought of as a column vector, and r = cos(#) G [—1, 1]). 

Given a set of values Rk for k 6 X, it is often important to determine whether they are indeed the 
moments of a power density f(9), and if so to characterize all consistent power spectra. The case of 
arrays with equispaced elements is very special and answers to such questions relate to the non-negativity 
of a Toeplitz matrix formed out of the i?fc's. In the present situation nonnegativity of 



jV2T 



[1 



jV2- 



t] dr 



which, in the obvious indexing turns out to be 
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(4) 



is only a necessary condition. The fact that it is not sufficient (see e.g., [25, page 786]) motivated the 
present study. 



B. Two-dimensional distributions 

The subject matter of multi-dimensional distributions received considerable attention in the 1970's and 
early 1980's (e.g., see [50]). However, despite the rich theory of analytic interpolation and orthogonal 
polynomials in more than one variable, etc. (e.g., see [37], [64]), as pointed out by Brad Dickinson [16], 
the analog of the questions raised earlier never received a definitive answer. 

For instance, consider the simplest possible planar grid 

2 := {k,£eZ : < k < n, < t < n} 

which is both regular and square. In fact, we may even assume that n = 2. Then, consider the case where 
(monochromatic as before) waves impinge upon sensors at the grid nodes, the sensors are sensitive to 
reception on one side of their plane, the waves are planar, originating from all possible directions in the 
sky and uncorrelated over different directions, and that correlations at the sensor outputs are taken. The 
array manifold in this case becomes (after suitable normalization assumptions) the 3x3 array 

with (9, (p) E [0, 7r] x [0, 7r]. This is quite standard (e.g., [39], [73], [34]). Now, if p(9, <p) denotes a positive 
scalar power density for the waves originating from (normalized) Euler angles 9, <p then, correlation 
between the sensor outputs gives us a 3 x 3 matrix of covariance samples 

R k ,i:= ^ r e Kke+ ^ ) p(9, ( f ) )d9d4 > 
Jo Jo 

The same two questions that we raised earlier are again relevant: Given a 3 x 3 array, how can we tell that 
it originates as suggested above, and how can we characterize all power densities that are be consistent 
with the covariance samples Rh,p- 



k.J=0 
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Lang and McClellan [44] considered "maximum entropy spectra" and suggested evaluating those on a 
discrete grid 1 . Lewis [47] considered a framework which is applicable for answering the existence question, 
while [30] discussed parametrization of solutions as well. The present contributions allows addressing the 
most general situation where the sensor array elements receive vectorial echoes and hence, Rk,i as well 
as p(6, 4>) are matrices. 



C. Quantum measurements 

We temporarily adopt the language of quantum mechanics as in, e.g., [56], and explain how this relates 
to the linear algebraic framework in the introduction. We then discuss a basic academic paradigm which 
exemplifies the setting of the matricial moment problem. 

Let p AB denote the density matrix of a quantum system composed of two subsystems A and B. Each 
subsystem can be in two states |0) and |1), respectively. These can be thought of as the vectors 



and 



in 



Then, the states of the combined system, \i) ® \j) where i,j G {0, 1}, can be represented by a vector in 
C 4 . For instance, |0) (g> |1) corresponds to 

M 



i 





where in the last equation <g> is the Kronecker tensor product. Accordingly, p AB is a Hermitian 4x4 trace 
one matrix formed out of sums of Kronecker products of 2 x 2 density matrices of the two subsystem. 
Now, if p A represents the density matrix of subsystem A, then this can be obtained from p AB via taking 
the partial trace with respect to subsystem B (e.g., see [56, pages 106-107]). An important example is 
that of the Bell state with 



|01) + |11>\ /(00| + (11 



V2 



V2 





(1 








1\ 


1 














2 
















V 








V 



The partial trace then gives (see [56, pages 106-107]) 

p A = traces (p AB ) 

where I 2 is the 2x2 identity matrix. If p AB is represented as 4 x 4 matrix as above, then 



h 2 

2 2 



9' 



2 



/ 4 G k p AB G\ 



k=i 



where 



(1 0) 



10 1 
10 



and similarly, C7 2 = h <E> (0 l) . The reverse task of characterizing all possible p AB based on a known 
p A is a (discrete) moment problem. 

'Lang and McClellan [44] were the first to point out that existence cannot be guaranteed for the usual "entropy rate" functional J s log(p)d9 
unless the dimension of S is one, cf. Theorem |4] below. 
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D. State-statistics and analytic interpolation with degree constraint 

Consider the linear discrete-time state equations 

%k = Axk-i + Bv,k, for k G Z, (5) 

where x k G C n , u k G C m , A G C nxn , B G C nxm , (A, S) is a controllable pair, and the eigenvalues of 
A lie in the open unit disk of the complex plane. Let {u k : k G Z} be a zero-mean stationary stochastic 
process with power spectrum the non-negative matrix-valued measure dfi(9) on 9 G (— 7r, 7r]. Then, under 
stationarity conditions, the state covariance 

i? := S{xfcxJ} 

can be expressed in the form of the integral (cf. [51, Ch. 6]) 

R = r (g^^&g^y) (6 ) 



2tt 
where 

G{z) := (I-zA)- 1 B 

is the transfer function of system ©. Note that we use z to denote the transform of the delay operator 
and therefore G(z) is analytic in the unit disc of the complex plane. 

It turns out that state covariances R are characterized by the following two equivalent conditions (see 
[28], [29]) 

" R - ARA* B 
B* 



rank 
and, 



2m (7) 



R - ARA* = BH + H*B* for some H G C mxn . (8) 

Then, power spectral measures consistent with © are in correspondence with matrix valued functions 
F(z) on the unit circle D := {z G C : \z\ < 1} which have nonnegative real part via the Herglotz 
representation 

with jc an arbitrary skew-Hermitian constant. The measure dji can be recovered as the weak* limit of 
the real part of F(z) as z tends to the boundary, i.e., 

dale) ~ ]im$t(F(re je )). (10) 

r /I 

The class of nonnegative real matrix valued functions F giving rise to admissible power spectral measures 
are also characterized by the interpolation condition ([28]) 

F(z) = H(I-zA)- 1 B + Q(z)V(z) (11) 

where Q is a matrix function analytic in D, 

V(z) :=D + zC{I - zA)~ 1 B (12) 

and C G C mxn , D G C mxm are selected so that V is inner, i.e., V(0*V(f) = V(£)V(£)* = I for all 

iei = i. 

The data A, B, H and V(z) in equation (fTTT) specify an analytic interpolation problem. Positive-real 
solutions to (fTTTl can be given via © and solutions to the moment problem ©. The characterization of 
solutions to © given in Theorem |6] allows a non-classical characterization of solutions to (fTTTl and in 
particular a characterization of solutions of McMillan degree less than or equal to the dimension of © 
(see Section ITV-B1) . 
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III. Matricial distributions and their moments 

The moment conditions ((T]|21) are linear constraints on densities pt (A; = 1,2,.. .) and p(9) (9 G S), 
respectively. Density functions, whether discrete or continuous, are non-negative, or non-negative definite 
in the matricial case, for each value of their indexing set. Thus they have the structure of a cone. Entropy 
functionals on the other hand represent natural barriers on such positive cones and can be used to identify, 
and even parametrize, density functions which are consistent with given moment conditions. We begin 
by explaining the geometry of the moment problem for constant density matrices and the relevance of 
entropy functionals in obtaining solutions as their respective extrema. Both, the geometry of cones of 
matricial densities functions as well as the role of entropy functionals is quite similar and is taken up in 
Section UTTBl 

A. Relative entropy and the geometry of matricial cones 

We begin by focusing on constraints 

R = ^2 G ^t(k)pG right (k) 

k 

where p is not indexed. The general case is quite similar. 
We use the notation 

M := {MeC mxm : M = M*}, 
97? := {M G M and M > 0}, 
97T+ := {M G M and M > 0} 

to denote the space of Hermitian matrices and the cones of non-negative and positive definite ones, 
respectively. The space M is endowed with a natural inner product 

(Mi, M 2 ) := trace(M*M 2 ) = trace(MiM 2 ) 

as a linear space over R. Clearly, both, 3Jt and 97t + are convex cones. Since non-negativity of (M 1 ,M) 
for all Mi G 97? implies that M G 97?, it follows that 97? is self-dual 2 . It can also be seen that 971+ is the 
interior of 971. 

The linear operator 

L:M^V\: p^ R = Y, G lc{t (k)p G ligU (k) 

k 

where 91 C C™ lcftXnright denotes the range of L, maps 971 onto the cone of admissible moments K, = 
L(97T) C 91 Here, and throughout, Gi e ft, Gright are matrices of dimension n Mt x m and m x n righ t, 
respectively. A further assumption that is often needed is that the null space of L does not intersect 971, 
i.e., 

null(L) n 97t = {0}. (13) 

The interior of K, is int(/C) = L(97t + ) and, given R, the moment problem requires testing whether R G K 
and if so, characterizing all p G 97T such that R = L(p). 
Geometry in the range space 9^ is based on 

(A, R) := 3?e (trace(A* J R)) , for A, R G 9t (14) 

Then the adjoint transformation of L is 

L*:9i^M:A^p=(^ G left (k)*X G vight (k)* J 

\ k / Herm 

2 In general, the dual cone 9Tt dual is the set of elements forming an "acute angle" with all elements of the original cone, i.e., {M : 
(M,Mi) > 0, VA/i £ 9JI} (see [42]). 
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where (M) He rm := | (M + M*) is the "Hermitian part". The dual cone of JC, 

^dual . = | A G ^ . ^ R j > o, \/R G /C}, 

is naturally related to the cone 971 C M. In fact, using (X,L(p)) = (L*(X),p) it follows easily that 

/C dual = {\e<R : L*(X) G 971}. 

The interior of the dual cone 

/C dual := int(/C dual ) := {A : (A, R) > 0, Vi? G K - {0}} 

corresponds to 9Jt + as is easily seen to satisfy 

£ + ual = {A : L*(A) G 971+}. 

Finally, (fT3l) can be seen to be equivalent to /C dual 7^ 0. 

1) Minimizers of §(I||p):: We are interested in minimizers of (the negative entropy) 

§(/||p) = -trace(logO)) 

on 97T+ subject to R = L(p). Here and throughout, "J" denotes the identity matrix of size determined 
from the context. When such a minimizer exists at an interior point of 9JIr i+ , stationarity conditions for 
the entropy functional dictate an explicit form for the minimizer (which, is unique due to the convexity 
of — trace(log(p))). 

The Lagrangian of the problem is 

£(A, p) := trace(- log(p)) - (A, R - L{p)). 

Using the expression for the derivative of the logarithm given in Proposition |8] of the appendix, the 
(Gateaux) derivative of C in the direction 5 G M becomes 

dC(X, p;5) := trace(-M p - 1 5) + (A, L(5)) 
= trace(-p- 1 5) + (L*(\),6). 

In the above derivation, the "trace" is what allows replacing the "non-commutative division operator" 
M~ l (cf. (1451) ) with multiplication by p~ l . The stationarity condition dC(\,p; S) =0 then gives 

p = (L*(A))-\ (15) 

Thus, a necessary condition is that there exist A G /C dual such that L*(\) is strictly positive, i.e., that /C dual 
is nonempty. It turns out that if R G int(/C) then this condition is also sufficient. 

Theorem 1: Assume that R g int(/C). Then the entropy functional S(I\\p) has a minimum in 9Jl R>+ , 
which is also unique, if and only if /C dual is nonempty. 

Proof: Necessity is obvious and was argued above. Uniqueness follows from the matrix convexity of 
log(-) (which follows e.g., from a positive Hessian d44l given in the appendix). Sufficiency requires that 
there exists A x G /C dual such that R = L((L*(Ai)) _1 ). Then p 1 = (L*(Ai)) _1 satisfies both the stationarity 
conditions and the contstraints. We show this via a continuity argument which we will adopt again later 
on to more general cases. 

Consider the mapping 

h : K* + ^ int(/C) C 9i 
: A^LUL^A))- 1 ). 

Its Jacobian 

Vh\ x : <K^9i : 5 i-> L(L*(A)- 1 L*(5)L*(A)- 1 ). 
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is Hermitian since 

(5i,Vh(5)) = (L*(5 1 ),L*(X)- 1 L*(5)L*(X)- 1 ) 
= (L*(\)- 1 L*(5 1 )L*(X)-\L*(5)) 
= (Vh(Sx),S). 

The map h is a local diffeomorphism and V/i can be used to relate locally a smooth path in the space of 
R's to one in the space of A's (of course, both spaces being the same space 
Choose a A G K\, let 

Po = (L*(A ))- 1 , 

and let R = L(p ). Consider the interval path R T — (1 — r)R + tR, for r G [0, 1]. Since i? G int(/C), 
so is the whole path R T (r G [0, 1]). We claim that for all r G [0, 1] there exists A r G /C^ 1 such that 
i? r = L(L*(X T )^ 1 ). It is clear that this holds locally and that A r satisfies 

^X T = (Vh\ XT )- 1 (R-R ), (16) 

since jpR T — R — Ro- The starting point is Ao and (IToT) can be integrated over a maximal interval [0, e) 
for which A r G /C dual . Throughout 

R T = L{L*{\ T )- X ). 

If e > 1, this proves our claim. If e < 1, then either ||A r || — > oo as r — > e, or the A r 's have a limit point 
A e on the boundary of /Cl ual , i.e., such that £*(A e ) is singular. Below we argue that neither is possible, 
which then shows that e > 1 and completes the proof. 

We first show that A T remains bounded. Assume to the contrary, i.e., assume that ||A r || grows unbounded 
as r — > e, and let i T := A r /||A T || (where ||A|| = a/ (A, A) as usual). Since R e G int(/C), it holds that 
R e = L(p) with p > 0, and (£ T , R) is bounded away from zero for elements i T G /C dual of unit norm. 
However, because 

(X T ,Rr) = (L*(A r ),L*(A T )- 1 ) =trace(J), 

it follows that {£ T , R T ) = trace(/)/||A T || has zero as a limit point when r G [0, e); hence, so does (£ T , R € ). 
But this is a contradiction, hence A r remains bounded. 

We finally show that L*(A r ) _1 and Vh along with its inverse remain bounded. Consider the quadratic 
form 

(6,Vh Xr (S)) = \\L*{\ T )- x / 2 L*{5)L*{K)- 1/2 \\\ (17) 

for 5 G 9^. Since A r (and hence L*(A r )) remains bounded, the quadratic form is bounded away from 
zero when r G [0,e). Hence, V/il^ 1 is uniformly bounded on [0, e). On the other hand, because of (TT31) . 
the minimal angle between any ray in the cone 071 and range(L*) is bounded away from ix/2. Hence, 
||-Rr|| > Q; II-^*(^t)~ 1 || 5 for some a > 0. But R T remains bounded. We conclude that L*(A T ) -1 remains 
bounded and that Vh remains bounded as well. This completes the proof. ■ 
2) Minimizers of S(p||/):: We now focus on minimizers of 

§011/) = trace (plog(p)) 

in subject to R = L(p). The Lagrangian this time is 

C(X,p) := trace(plog(p)) - (A, R - L(p)). 

Once again, using the expression for the differential of the logarithm given in Proposition[E]of the appendix, 
the (Gateaux) derivative of C in the direction 5 G M becomes 

dC(X,p;6) := trace(51og(p)+pM p " 1 (5)) + (A,L(5)) 
= trace(5 log(p) + 5) + (L* (A) , 5) . 
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The last step follows from 

poo 

trace(pM / ; 1 (5)) = trace(p / (p + ty 1 5(p + ty l dt) 

Jo 

l>CO 

= trace( / (p + t)~ l p(p + 5) = trace(5). 

Jo 

The stationarity condition dC(X,p; 5) = then gives that 

p = exp (-J - L*(A)) = - exp (-L*(A)) (18) 

with L*(A) G M (and not necessarily in Wl as before). It turns out that if R G int(/C) a minimizer can 
always be found. It should be noted that (fT3t is no longer a necessary condition. 

Theorem 2: If i? g int(/C), then the entropy functional §(p||J) has a minimum in Wl R , + which is 
unique and of the form f!8)>. 

Proof: We use a similar continuity argument as before. Consider the mapping 

k '. D\ — ► int(/C) C <K (19) 
: A i-> L 0exp(-L*(A))^ . 

Its Jacobian 

Vfc| A : <K^!K : 5^-L(M exp( _ x * (A)) (-L*(5))), (20) 
with Mc as in (l43t . is Hermitian and negative definite. This is because 

(5 u Vk\ x (8)) = -^(L*(5 1 ),M cxp{ . L , W) (L*(5))) 



■-trace{L*{5 l ) [ e -^ L ' ^ L\5)e~ tL * {x) dt) 
e Jo 

-trace( /' e^^L*^)^ 1 -'^ d*L*(<y)) 
e Jo 



(v«u(5i),(j), 



while 



(V5,«| A ((J)) 



1 Z" 1 

- / trace(A5L*(5)A-5AA-5L*(5)A5)dt 
e Jo 



with A = exp(— L*(A)) G 9Jt+. Then (V5, «|a(^)) is clearly negative unless 5 = 0. Thus, the map k is a 
local diffeomorphism and Vk can be used to relate locally a smooth path in the space of i?'s to one in 
the space of A's. 

Begin with A in 9^, R = L(± exp(-L*(A )), and R T = (1 - r)i?o + ^i? for r G [0,1]. Since 
_R , -R £ int(/C) then i? r G int(/C) for all r G [0, 1]. We claim that 

^-K = {VK\ Kr )- 1 (R-R ), (21) 
dr 

can be integrated over [0, 1] with A r staying bounded. Then, by construction, A r satisfies both 

R T = L(iexp(-L*(A T )) (22) 

e 

as well as the stationarity conditions for each r. Hence, 

p = -exp(-L*(Ai) 
e 
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is a minimizer for S(p||7) as claimed in the proposition. 

Clearly d2"TT) can be integrated over [0, e). If e > 1, we are done. Thus we only need to show that e < 1 
and 

||A r || — ► oo as t — > e 

lead to a contradition. To this end, we use two facts, first that (|22fr holds on [0, e) and then, that (£, _R e ) > 
for all £ G /C dual , since e int(/C). 

Define £ r = A T /||A T || and let £ e be a limit point of a convergent subsequence £ n with — > e (which 
exists since the £ T 's are bounded). We claim that L*(£ e ) > and singular. To see this first note that, if 

spectrum(L*(4)) n (-oo,0) ^ 0, 

then spectrum (L*(£ n )) fl (— oo,0) ^ as well, for sufficiently large i. But if this is so, then 

L(i e - L *^)) = L(I e -^)IIM) (23) 
e e 

grows without bound instead of tending to i? e . Therefore, 

spectrum(L*(4)) C [0, oo). 

Now, if L*(£ e ) is nonsingular then for all i sufficiently large L*(£ T .) is nonsingular as well. But then, d23t 
tends to zero as i — > oo. Hence, € spectrum(L*(4)) C [0, oo). 

Now let U be an isometry (U*U = I) whose columns span the range of L*(£ e ) and consider 

(£e,R Tl ) = -(L\Q,e- L ^ x ^) = -(L%Q,e~ L '^ x ^ 
e e 

= -(U*L*(QU,e- u * L *^)mn\\y 
e 

Since 

U*L*(i Ti )U -> U*L*{QU > 

while ||A Tj || — > oo we conclude that (£ e , _R T .) — >■ as r — ► oo. Since, i? Ti — > R e it follows that (£ e , R e ) = 
which contradicts the hypothesis that R e G int(/C). ■ 



5. Relative entropy and matricial distributions 

The geometry of convex cones and of the moment problem when p is a matricial density function on 
a compact set S, as in (EH), is quite similar to the case where p is only a positive matrix as in Section 
IIII-AI Appropriate generalizations of the relative entropy functionals allow computable expressions for 
the corresponding extrema when S is a closed interval of the real line, or even a multi-dimensional closed 
interval in ~R k (k > 1). We develop this theory focusing on 0. 

We consider Hermitian m x m matrix-valued measurable functions on S as a linear space over R with 
an inner product 

(m,!, 777,2) = J tr&ce(mi(6)m 2 (9))d9. 

We use the notation M to denote the Hilbert space of square integrable elements, and the notation Vft 
and ITI + to denote the cones of elements which are nonnegative and positive definite, respectively, for all 
6 E S. The linear operator 

L : Wl^dK : p^R = J G Mt (6)p(6)G right (6)d6 (24) 

maps lit into a subspace of C lcftxright denoted by 9^1 as before and viewed as a linear space over R. Both, 
moments R and their duals A reside in and the geometry is always based on (TBI) . For simplicity of the 
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exposition, we assume that the integration kernels Gieft, bright are continuously differentiable on S. The 
closure of the range of XXX is denoted by /C = L(XXt), while int(/C) = L(£TT+). The adjoint transformation 
is now 

L*:m^M:\^p= (G Mt (9YX G Tight (9)*) RcTm . 
It is not difficult to show that the expressions for the dual cone and its interior 

^duai = {X€1R : L*(X) G XYl}, and 
^duai = {A G 9* : L*(A) G m+} 

remain valid (except for the obvious change where XXI replaces our earlier 971). The analog of (fT3l) will 
be needed (in Theorem H|) which, can also be expressed as 

i^ ual ± 0. (25) 

Finally we define as before 

xn Rt+ -.= m+ n {p g m : r = l( p )} 

as we seek to determine whether or not XXIr )+ = 0, or equivalently, whether R G int(/C). 

For future reference we bring in a characterization of elements R G K, analogous to the scalar real case 
given in [42, page 14]. Given R G 9t, define the real-valued functional 

£ R : 9i^M 

: A^(A, i2) (26) 

Such a bounded functional is said to be nonnegative (resp., positive) — denoted by <t R > (resp., €r > 0), 
if and only if the infimum of £r(A) over A G /C^. ual of unit norm is positive (resp. nonnegative). 
Proposition 3: The following hold: 

R G K. <£> £ fl > 
i? G int(/C) ^ (T R > 0. 

Proof: We now only prove necessity, which is needed in the proof of Theorem HJ The proof of 
sufficiency will be given at the end Section IIII-B.ll 

If R G int(/C), there exists a particular p G flt + such that R = L(p). It readily follows that £ > 0. If 
R E JC, there exist an approximating sequence R4 — > R (i = 1, 2, . . .) where i?j = L(pi) and pj G XXI. If 
A G /C^ ual then 

e*(A)= lim (A, 

which is > and hence at least £ > 0. ■ 
We now turn to relative entropy functionals for matricial distributions. Given p, a G t1t + , 

S(p || cr) := J trace (p log p — p log a)d9. (27) 

Once again, minimizers of relative entropy subject to the moment constraints © take a particularly 
simple form amenable to a numerical solution via continuation methods. We follow the same plan as in 
Section ITlI-Al bv focusing successively on each of the two alternative choices, S(/||p) and then S(p||J). 
A significant departure from the case of constant densities shows up when considering the dimension of 
the support set S in the context of S(/||p). 
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1) Minimizers of $(/||p) = — J s trace(log(p))d0:: In complete analogy with constant case the 
derivative of the Lagrangian 

£(A, p) := - J trace(log(p))d# - (A, R - L(p)) 

in the direction 5 G M is 

dC(X,p;5) := trace J {-M^ + L*(X))5{d)dB 
= trace J (-piOy 1 + L*(\)) 5(6) d6, 

where, once again, the presence of the trace allows replacing the "super-operator" by multiplication 
by p(0) _1 , pointwise over S. The fundamental lemma in calculus of variations (see e.g., [5]) now gives 
the stationarity condition 

p = L*(\)-\ (28) 

In order for p e Vfl it is necessary that L*(X) is strictly positive on S. Thus, we consider the "rational" 
family of potential minimizers for S(/||p) 

m rat := {p = L*(\)-\ with A G /C^ ua1 } , 

where we seek a solution to the moment constraints ©. It turns out that if a solution exists then, a 
particular one exists in nt rat and that it can be obtained by computing the fixed point of an exponentially 
converging matrix differential equation. This differential equation is an appropriate generalization of (fT6l) . 
We summarize all these conclusions below. 

Theorem 4: If dim(«S) = 1, condition ||25} holds, and R e int(/C), then $(/||p) has a minimum in 
t\X R>+ which is unique and belongs to fl? rat . Furthermore, for any A e /C| ual , the solution X t of the 
matrix differential equation 

^A t = (V/i|A 4 )- 1 (i2-L(L*(A t )- 1 )), (29) 

where 

Vh\ Xt ■■ -> K ■■ L(L*(X t )- 1 L*(8)L*(Xt)~ 1 ), (30) 

belongs to /C[ ual for all t e [0,oo), it converges to a point A e /C+ ual as t -> oo corresponding to 
this unique minimizer p = L*(A)~ X for ffi(/||p) satisfying R = L(p). The differential equation {29J is 
exponentially convergent as the square distance V(X t ) = \\R - Zv( J L*(A t )~ 1 ) || 2 satisfies 

^ = -2V(A«). 
at 

Conversely, if R £ int(/C) and the dimension of S is one, then the differential equation iHHJ) diverges. 
Equations (l29l) is equivalent to 

^ I A T = (V/i|0" 1 (^-i2o), (3D 

modulo scaling of the integration variable (see below). The latter can be integrated over [0, 1], and then 
A = A T | r= i, yet (l29l appears preferable for numerical reasons. 

We wish to point out that the assumption on the dimension of S can be slightly relaxed to being at 
most two provided S is a torus and GWt, G ri ght doubly periodic accordingly, (cf. [44, Example 2 on page 
882], [30]). 

Proof: Once again we consider the mapping 

h : : A i-> R = L(L*(A)- X ). 
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The Jacobian Vh\x is Hermitian and invertible when A G /C dual . The proof is identical to the one given 
in Section Hll-A.il Choose A G /C4 ual , set R = L(L*(A ) _1 ), and consider the one-parameter homotopy 
of maps 

LiL^Xr)- 1 ) = R + r(R-Ro) =: R T , (32) 

for r G [0, 1]. The idea is to follow a path of solutions A T (r G [0, 1]), ensure that this is contained in 
/C^ ual , and set A = Ai which then satisfies the sought conditions (L*(A) _1 G fl? rat and L(L*(A) _1 ) = R). 

Clearly, L*(A ) _1 G m rat and #0 e int(/C). If -R G int(/C) then so is R T for r G [0, 1]. We claim that 
for all t G [0, 1] there exists A r G /C+ ual such that 

R T = LiL^Xr)- 1 ) (33) 

as before. The arguments are similar to those given in the proof of Theorem [T] and are based on the fact 
that h is a local diffeomorphism. Values A T which obey 

^ = (Vh XT )-\R-Ro), (34) 

GST 

satisfy d33t as long as the path stays in /C+ ual . We need to rule out A T crossing the boundary of /C^ ual or 
tending to oo at a r < 1. Either possibility contradicts R T G int(fT) (r G [0, 1]) in a way analogous to 
the earlier arguments in the proof of Theorem [lj 

We consider a maximal interval [0, e) over which R T G hit (if), and note that 

(X T ,R T ) = J tr&ce(I)d9 = trace(J) ■ measure(iS) < oo, 

on [o, e). If ||A r || — > oo, then £ Rr cannot be bounded away from zero since {p^,R T ) — > 0. Hence, 
R e in(/C) and e > 1. 

If A e lies on the boundary of /C dual , then L*(X € ) has a root in S. It is here that the dimension of S 
becomes important. As long as the dimension of S is one, L*(A e )~ 1 is not integrable as a function of 
9 G S and R e = L(L*(A e ) -1 ) cannot be finite. Thus again, e > 1. 

We finally express (1341 in a "feedback form". We first replace r with t = — log(l — r). In this case, 
r = 1 — e~* and t varies in [0, oo) as r varies in [0, 1]. If we denote A t := X T ( t \ and R t := R T tt), then 

= (l-r^CV/iUJ- 1 ^-^) 

= (V/iUr'GR-^). (35) 
The same substitution gives ^ = Ri — R t , and that 

V(X t ):= WR-RtW^WR-LiL^Xty^W 2 

satisfies 

^ = -2{R-L{L*{X t )-%Vh\ Xt ^X t ) 
= -2{R-R t ,R-R t ) = -2V(X t ). 

Uniqueness of the representation R = L(p) with p G m rat follows from the fact that such a p is a 
minimizer of the convex functional S(/||p) subject to the moment constraints. An alternative but equivalent 
argument can be based on the fact that h is a C 1 -mapping between open convex subsets of a Euclidean 
space, with a positive definite Jacobian everywhere. 

In the other direction, if R G" int(/C), then the path R T either crosses or at least, in case R G dK,, tends 
to the boundary of /C. In either case, X t grows unbounded and the differential equation diverges. ■ 

We now complete the proof of Proposition 
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Proof: ["sufficiency" in Proposition^] In the proof of TheoremHJ in essence, we used the positivity 
of <£ R to obtain a representation R = L(p) with p G fll + . Thus, the same line of argument gives that if 
CCr > then R G int(/C). 

If £r > but not necessarily > 0, then chose R G int(/C) and note that C r+ i Rq > for k — 1, 2, 

Obviously, i? + |i?o £ int(/C) (/c — > oo) and R is at least in /C. 3 ■ 
2) Minimizers of $(p||/) = J 5 trace (p log (p))d8:: Once again, the derivative of the Lagrangian 



£(A, P) -■=- f trace(p log(p))d0 - {X,R- L{p)) 



in the direction <5 G M is 

d£(A,p;5) := trace / (log(p) + / + L*(A))5(0)d0. 
The stationarity condition leads to the expression 



5 



p=-exp(-L*(A)), (36) 

e 

for the minimizer, except that now p is a function of 6 G S. We consider the "exponential" family 

m exp := jp = i exp(-L*(A)), with A G 91 j , 

of potential minimizers for S(p ||J), where we seek a solution to 0- The development runs in parallel 
to the case where p G fl? rat with one important difference. The "Lagrange multipliers" A no longer 
need to be restricted to /C dual and existence of solutions when R G /C can be guaranteed even when 
dim(iS) > 1. Moreover, (1251 is no longer necessary and existence of solution to the moment problem in 
nt cxp is impervious to the dimension of the dual cone /C+ Ual . 

Theorem 5: If R g int(/C) then the entropy functional $(/||p) has a minimum in tXl Rj+ which is 
unique and belongs to lTI exp . Furthermore, for any A g 91, the solution A t of 

^A t = (Vfc| At )- 1 (i?-L(A t )), (37) 

where 

Vfc| At : 91 -> 91 : 5 h-> --L(M exp( _ L , (At)) (L*(5)), (38) 

remains bounded for t e [0, oo) and converges to A e 91 as t — > oo corresponding to the unique 
minimizer p = ±exp(-L*(A)) for S(/||p) subject to R = L(p). The convergence is exponential as 

V(A t ) = - |L(exp(-L*(A t )))|| 2 satisfies = -2V(A t ). Conversely, if R & int(/C) then the 
differential equation (37J) diverges. 

Proof: The arguments are for the most part identical to those used in proving Theorem El i.e., we 
now consider 

K : 91 -f int(/C) C 91 : \ R = L(-e~ L *^), 

e 

observe that the Jacobian is negative definite for any value of A, and use it to track a linear path (1 — 
t)R + tR (r G [0, 1]) from R = L(ie~ L *( A °)) to the given R in the A-coordinates. This is done by 
integrating (TUl over [0, 1] starting from arbitrarily chosen starting point Ao- By construction, the solution 
of (I2TT) corresponds to p r = \e~ L *^ Ar - ) G tXt + which satisfies R T = L(p T ). It is clear that if R G" int(/C), 
then the differential equation diverges for r < 1 (since otherwise p r G Vfl + and R T = L(p T ) would hold 
on [0, 1], contradicting R G" int(/C)). We only need to show that if R G int(/C), then A r remains bounded 

3 A uniform bound on the integral of densities corresponding to 7?+ j:Ro can be shown. This can be used to establish a finite nonnegative 
measure d/i corresponding to R. 
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for r G [0, 1]. This yields A = A T | r= i which satisfies R = L^-e~ L '^). Then @7J can be obtained via a 
change of variables as in Theorem The same applies to deriving the differential equation for the "error" 
V(Xt). 

In order to show that, in the event R G int(/C), A r remains bounded on [0, 1] we extend the argument 
used to prove Theorem |2] to the present case where p is a matrix valued function on S. The key is 
to observe that, when (OTT) is integrated over a maximal interval [0,e), any convergent subsequence of 
£ T ■ = A T /||A r || (r G [0, e)) must have a limit point £ £ for which L*(£ e ) G fl? but not in flt + . Moreover, 
L*(£ e ) must be singular on S C S (a subset of possibly zero measure). To see this note the following. 
If L*(£ £ ) G Hl + then L*(£ Ti ) is bounded away from zero and positive for i large enough, whereas if 
L*(£ e ) ^ XXI then there is a subset of S of non-zero measure where L*(£ T .) is negative. Either way 
L(exp(— L*(A T J)) = L(exp(— L*(£ T . ■ ||A T J))) cannot tend to R e as it should. In the first instance it goes 
to zero and in the second it becomes unbounded. Thus L*(£ e ) G XXI but singular for certain values of 
9. Below we show that this implies R e G" int(/C), which then proves that e > 1 and that (l37t can be 
integrated on [0, 1]. 

To show that R e G" int(/C) it suffices to show that <L Rt is not strictly positive. To this end, we evaluate 

£ RTi (£ e ) = (£ e ,L(eM-L*(l Ti \\\ n \\)))) 
= (L*(4),exp(-L*(4||A T J|)))) 

= J trace(L*(4)exp(-L*(^ ■ ||A T .||)))^ 

= f trace(L*(4)(exp(-L*(£ T J)) l|A ^ 11 ^. 
Js 

For each value of 9, (exp(— L* (£ n )))^ XT ^ tends to zero outside the null space of L*(£ e ). Since S is 
compact, the integrand goes to zero uniformly in 9 as i — > oo. Therefore, £r t (4) —> as well, and 
i?G" int(/C). " " " ■ 

C. Non-equispaced arrays (cont.) 

We continue with Example III-AI We begin with a "true" density p true shown in Figure |3 and generate 
covariance samples R. This "true" density does not need to be in any particular form — computation of R 
is done via numerical integration. 

Next, we integrate d^t and (071) taking A = [l 0] , and display in Figure El the resulting 
Pex P (Aoo, 0) and p rat (A 00 , 9), for comparison. Both are constructed using the fixed point of the correspond- 
ing differential equations. The rate of convergence is the same, while the distance of the starting choice 
(for the same Ao) may be different — as is the case here (with nt cxp corresponding to the y-axis to the left 
and flT ra t the labeling to the right in subplot (2, 1)). 

IV. The complete set of positive solutions 

Reference [31] suggested that all positive solutions to the moment problem may be obtained as 
minimizers of a suitable entropy functional, e.g., as being 

argmin{S(cr ||p) : R = L(p)} (39) 

with er thought of as a parameter. This was carried out successfully in [31] and [30] for the case where 
density functions are scalar-valued, for different levels of generality. Naturally, certain complications arise 
in the matricial setting. We discuss this next in the context of constant p, a as in Section IIII-AI) . The 
generalization to the non-constant case is straightforward and a positive result is given for the general 
case. 
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Fig. 2. 

Considering the Lagrangian and the stationarity conditions for O^t we arive at 

d£(\,p;5) = trace(-5M p " 1 ((x)) + (L*(\),5) 
= trace {-5M~\a) + 8L\\)) , 

leading to 

m;V) = l*(A). 

Although the "parameter" a can be readily expressed as M P (L*(X)), the density p which we are interested 
in, cannot be expressed in any effective way as a function of a and the dual variable A. Thus, a convenient 
functional form for the minimizer of (0^1) is unkown. 

The option of minimizing S(p||cr) subject to R = L(p) however, goes through. Analysis of the 
corresponding Lagrangian readily leads to 

p=-exp(log(t7)-r(A)). 

e 

A computational theory, following the lines of Sections IIII-A.2I and IIII-B.2I easily carries through. 

An attractive third alternative originates in the observation that the geometry of the problem, throughout, 
was inherited by the definiteness of the Jacobian maps. This suggests to forgo an explicit form for the 
entropy functional and start instead with a computable Jacobian. To this end we consider 

K ■ A i-> L(cr 1/2 L*(A)~V 1/2 ), and 

K a : A^L(a 1 / 2 iexp(-L*(A)) ( x 1 / 2 ). 

e 

The respective Jacobians are 

Vh a \ x : 5 ^ L(a 1 / 2 L*(\y 1 L*(5)L*(X)- 1 a 1 / 2 ), and 

Vn a \ x : 5^^L(a^ 2 M eM ^ (x)) (-L*(5))a^ 2 ). 

They are both sign definite as before and, almost verbatim, we can replicate the conclusions of Theorems 
|4] and These are combined into the following statement. 
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Theorem 6: Let R G int(/C) and a e XXl + . If dim(S) = 1, condition holds, and A G int(/C+), 
then the solution to 

j\t = {Vh a \ Xt )- l {R-K{\ t )) (40) 

remains in /C^. ual for t > and as t — > oo converges to a unique value A r g /C+ ual such that 
R = h a (\ r ). On the other hand, for any A gSR the solution to 

j t \t = (VK a \ Xt )- 1 {R-K a {\ t )) (41) 

remains bounded for t > and as t -> oo converges to a unique value A c g /Cf ual such that 
i? = K a (X e ). In case R G" int(/C), then (|4T) diverges. In case R G" int(/C) and dim(S) = 1, then (|40j 
diverges as well. 

The importance of recasting Theorems |4] and |5] as above, by incorporating arbitrary cr's in allows 
obtaining any density function which is consistent with the data R by such a procedure. To see this note 
that, if p consistent with the data, then working backwards we can select a accordingly so that p equals 
a l / 2 L*(\)- l a 1 / 2 or ia 1 / 2 exp(-L*(A))a 1 / 2 for any A (in /C^ ual and JR, respectively). Thus, Theorem 
gives descriptions of all positive densities that are consistent with the data R — simply choose the "correct" 
a. 

A potentially important application is when prior information may dictate a choice of a. In this case, 
using Theorem |^1 we may obtain an admissible density function which is "closer to our expectations." We 
amplify this remark by reworking Example III-AI with a suitable weight. 



A. Non-equispaced arrays (cont.) 

Figure |3] compares the "true" density function p t ruc which was used to generate the moments, and a 
density p rat = a 1//2 L*(A)o" 1//2 which is computed according to Theorem The original density p true has 
discontinuous peak at about 9 ~ 0.35. Then a has been selected so as to be > 1 in the neighborhood 
of 9 ~ 0.35 — actually centered about 0.25. (The accuracy of the "match" does not seem critical.) The 
density p rat is seen to be a better match as compared with the "unweighted" case of Figure |2| Subplot 
(2,1) shows the value of \\Ri — R t \\ as before, and highlights the fact that, again, p rat is consistent with 
the moments. Since \oG(9) = 1, if we choose a = p true (using 100% hindsight), we obviously obtain a 
perfect match as explained above. 
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B. State-statistics and analytic interpolation with degree constraint (cont.) 

In Section HVl the map h a can be replaced by 

A i-> L(pL*(A) - V) 

where a = (ftp* is a factorization of a with <p not necessarily Hermitian, with the obvious modifications 
in the expression for the corresponding Jacobian. The statement of the theorem holds with no changes. 
The same applies to k„ which can also be cast with respect to an arbitrary factorization of a — but this 
will not concern us here. Instead, we consider the setting of Section III-DI where 

L* : \^B*{I-e- j9 A*)- 1 X{I-e j0 A)- x B. 

If we take (p(z) = I + C Q z(I — zA)~ l B so that y? -1 is also analytic in O (which corresponds to C Q chosen 
so that A — BC is a Hurwitz matrix), then the resulting density function 

P {6) = vL*(\yv 

= (G (e^)*AG (e^))-\ 

with G (z) = (I — z(A — BC ))~ 1 B. This is a rational spectral density of degree at most twice the 
dimension of ©, and hence, it gives rise to a positive-real interpolant F as in Q of McMillan degree at 
most equal to the dimension of ©. 

V. Concluding remarks 

We presented an approach for constructing matrix-valued density functions which are consistent with 
given moments. Section HVl describes, in the spirit of the mathematical theory on the moment problem, 
all positive-definite density functions which are consistent with the data. The non-parametric description 
given in Section ITVl (non-parametric since it amounts to an arbitrary choice of a weight-density a) should 
prove useful in case we wish to incorporate prior information (e.g., subsection IIV-A1 and cf. [31]). 

The basic problem of characterizing admissibilitiy of a matricial moment R has been cast in terms 
of the positivity of a suitable functional, <£ R , in complete analogy with the classical case [42]. However, 
testing for positivity of such a functional is not a trivial matter. In the classical theory, the "shift" structure 
of the space of integration kernels ([42], [66], [1], [2]) allows a simple description of all positive elements 
in their span, via "sums of squares." This is not the case here. Instead, we determine admissibility of R 
from the convergence of the differential equation given in e.g., Theorems Yet, a more direct analog 
of the Pick operator and a corresponding test that would allow a "certificate of positivity of €.r" would 
be highly desirable. 

The present work has been influenced by recent literature on "moments with complexity constraints" 
[11], [12], [6], [20], [27], [26], [14], and in particular by Byrnes, Gusev and Lindquist [13] who first 
exploited entropy functionals in such a context. Interpolation or moment problems with degree constraint 
seek to parametrize solutions of bounded degree within the "rational familiy." The "trigonometric moment 
problem with degree constraint" was first studied in [23] in both the scalar and the multivariable setting 
(via degree theory and homotopy [23, page 76, and Chapters IV and V]). All subsequent literature 
on "complexity constraints" focused on scalar problems until Blomqvist etal. [6] who study matricial 
Nevanlinna-Pick interpolation via minimizers of an entropy functional. The framework of the present 
work, which is also based on entropy functionals, when specialized to analytic interpolation, allows 
dealing with the most general tangential (and bi-tangential, cf. [4], [18]) Caratheodory-Nevanlinna-Pick 
problems with degree constraint. "Tangential interpolation" refers to the case V(z) in e.g., (II 1H12L is a 
matricial inner factor as opposed to simply a scalar-inner times the identity. This is examplified in Section 
I1V-BI While the main focus of the present work remains the general moment problem, consequences of 
the theory as in Section ITV-BI should prove useful in multivariable feedback design with degree constraints 
[32]. 



21 



Acknowledgment 

The author would like to thank Pablo Parillo for his input, and Laurent Baratchart for helpful discussions 
during a 2003 stay at the Mittag-Leffler Institute in Sweden. 

VI. Appendix: Matrix calculus 

We assemble a number of basic mathematical formulae. These are expressions for the differential of the 
matrix exponential and the matrix logarithm that have been used in the physics literature and in quantum 
information theory. 

A. The matrix exponential 

We begin with the differential of the matrix exponential (see [21], [53]). Following [36, page 164], 
integrate both sides of 

d_ r -tA e t(A+eBl = e ~tA eBe t(A+eB) 

dt 1 J 



between and t to obtain 



e -tA e t(A+eB) _ j = ^ e -tiA eBetl (A + eB) dti 

Jo 



Then 



e t(A+eB) = e tA + e tA 



x I e hA + e hA 



f e-^eBe^^dh 
Jo 

f 

Jo 

£ e-^ A eBe^ A+ ^dt 2 ^ dh 



e tA + e tA t e- tlA eB x 



where 



= So(t) + eS 1 {t) + e 2 S 2 (t) + ... 
S (t) = e tA (42) 



S 1 (t) = e tA [ e- uA Be tiA dh 
Jo 

S 2 (t) = e tA [ f 1 e-^ A Be^-^ A Be t2A dt 2 dt 1: 
Jo Jo 



and the general term S n (t) is 

ft ft\ rt„-i 

e tA / . . . / (e- tlA Be tlA ) . . . {e^ A Be l - A )dt n ...dh. 
Jo Jo Jo 

We are only interested in the first two terms of this convergent series. 
Evaluating at t — 1 and replacing eB by A, we obtain 



e 



A+A -e A = f e^ A Ae TA dr + o(\\A\\). 
Jo 



Hence, the differential in the direction A (often refered to as Gateaux, or polar, or Frechet) is given by 
the linear map 

e (l-r)A Ae rA dT _ 
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This map represents a "scrambled" multiplication of A by e A . To see this assume that A and A commute. 
Then the right hand side becomes simple e A A. 

The SVterm in (1421l gives the quadratic term in A in the expansion of e A+A — e A as 

(e (1 - Tl)A Ae TlA J (e- AT1T2 Ae T1T2A ) dr^j r x dr x . 

In general, for Hermitian matrices C and A, and C > 0, define the "non-commutative" or "scrambled" 
multiplication of A by C via the operator 



M c : Ai-> / C (1 - r) AC r dr. (43) 



o 



This gives a compact expression for the differential of e A , summarized below. 
Proposition 7: The differential of exp(A) ■= e A is M £ a. 

B. The matrix logarithm 

We now turn to the matrix logarithm. Integrate both sides of 

j t [log(/ + tP) - log(/ + tQ)] = (/ + tP^P - Q(J + tQY 1 
between and 1 to obtain that 

log(J + P) - log(J + Q) = [ {I + tQ)-\P-Q){I + tP)- l dt 

Jo 

assuming that B := I + P > and that A := I + Q >. Rewrite this expression in terms of A and B and 
change the integration variable to r = to obtain 

poo 

log(S) - log(A) = / (A + Tl) -1 (fl- ^(P+rl)" 1 ^. 

If £ = A + A, then 

(A + riy 1 - (A + A + r/)- 1 = (A + r/) _1 A(>4 + A + r/)" 1 
which, for A, A Hermitian, A > and A + A > 0, leads to 



/>oo 

log(A + A) = log(A)+ / (A + Tl)- x k{A + Tiy l dT 

Jo 

poo 

+ / {A + rl^AiA + Tl^AiA + Tiy^dr 
Jo 



(44) 



+ o(||A|| 2 ). 

By expanding in terms of eigenvectors of A it can be verified that 

POO 

/ {A + rI)- 1 A{A + Tl)- l dT = M A \A). 
Jo 

Indeed, if A is diagonal^, . . . , a n } then the (i, j)th entry of 

M A ^ jf (A + rI) _1 A(^ + rI) _1 dT 
is simply 

"1 POO 

af'^^dt \ (en + r)- 1 {a j + r)- 1 dr(A) ii 



(45) 
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where (A) i3 - is the (i, j)th entry of A (in this same basis where A is diagonal). Then 

[ 1 a?- t) * t j dt= . ( a T a ' , 
Jo J log(ai) - log(%) 

whereas, / °°(ai + r) _1 (aj • + r) _1 dT is the inverse of the same expression. This result is attributed to Lieb 
(see [65, page 4]) and summarized below. 
Proposition 8: The differential of \og(A) is MJ 1 . 
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